Skip to content

Propose pattern for pipelines with multiple/optional correction steps #211

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

SimonHeybrock
Copy link
Member

@SimonHeybrock SimonHeybrock commented May 1, 2025

This is a suggestion for a pipeline design pattern, addressing recurring needs for selecting one or more optional correction steps.

The notebook is a draft for gathering feedback on the idea. Will be cleaned up if positive.

See also #192, which I hope is not necessary if the suggested pattern works out.

@SimonHeybrock SimonHeybrock force-pushed the optional-correction-chains branch from 1ac947e to 124e910 Compare May 1, 2025 05:43
@SimonHeybrock SimonHeybrock marked this pull request as draft May 1, 2025 05:43
@jl-wynen
Copy link
Member

jl-wynen commented May 2, 2025

Nice idea. But this will lead to an exponential growth of functions in the number of corrections. This sounds like a maintenance problem.

We could mitigate this slightly by using something along the lines of

def apply_corrections(corrections: list) -> Callable[...]:
    # just a sketch, doing this requires unpacking the corrections list into separate arguments:
    def apply(raw_data: RawData, *corrections): return ....
    return apply

But this way, it would not be (easily) possible to find the actual function from the graph viz or JSON.

@SimonHeybrock
Copy link
Member Author

Nice idea. But this will lead to an exponential growth of functions in the number of corrections. This sounds like a maintenance problem.

I doubt there can be too many that are all applied at the same stage of a pipeline? What is the largest number you have encountered?

We could mitigate this slightly by using something along the lines of

def apply_corrections(corrections: list) -> Callable[...]:
    # just a sketch, doing this requires unpacking the corrections list into separate arguments:
    def apply(raw_data: RawData, *corrections): return ....
    return apply

But this way, it would not be (easily) possible to find the actual function from the graph viz or JSON.

Not sure I understand how your list approach works.

@jl-wynen
Copy link
Member

jl-wynen commented May 2, 2025

I doubt there can be too many that are all applied at the same stage of a pipeline? What is the largest number you have encountered?

Just 1 correction with 3 options in powder: the 'run normalisation'. In reflectometry, there are more corrections. @jokasimr How many are there?

We also have the pixel masks with an arbitrary number of masks. Can we also handle them with this approach instead of having a dedicated function that people have to apply to their pipeline?

Not sure I understand how your list approach works.

I was just brainstorming. But essentially:

wf.insert(apply_corrections([apply_a, apply_b]))

where apply_corrections([apply_a, apply_b]) creates and returns a function of the form

def apply(raw_data: RawData, correction_a: CorrectionA, correction_b: CorrectionB) -> CorrectedData:
    return apply_b(apply_a(raw_data, correction_a), correction_b)

@SimonHeybrock
Copy link
Member Author

I doubt there can be too many that are all applied at the same stage of a pipeline? What is the largest number you have encountered?

We also have the pixel masks with an arbitrary number of masks. Can we also handle them with this approach instead of having a dedicated function that people have to apply to their pipeline?

We usually used map/reduce for them? Are there other cases?

Not sure I understand how your list approach works.

I was just brainstorming. But essentially:

wf.insert(apply_corrections([apply_a, apply_b]))

where apply_corrections([apply_a, apply_b]) creates and returns a function of the form

def apply(raw_data: RawData, correction_a: CorrectionA, correction_b: CorrectionB) -> CorrectedData:
    return apply_b(apply_a(raw_data, correction_a), correction_b)

I am not sure you can auto-generate such a function that would work with Sciline?

@jokasimr
Copy link
Contributor

jokasimr commented May 2, 2025

Just 1 correction with 3 options in powder: the 'run normalisation'. In reflectometry, there are more corrections. @jokasimr How many are there?

If we are talking about scalar multiplicative corrections (that is, not polarization correction or background subtraction)
I don't think there are more corrections in reflectometry than for other techniques. For ESTIA there are:

  • Correction by virtual slit opening (essentially "footprint correction")
  • Correction by monitor (or proton current, or measurement time)
  • Correction by supermirror reflectivity (only for reference measurement)

If we are talking about "corrections" to coordinates (not to event weights). There might be more of those.

We also have the pixel masks with an arbitrary number of masks.

There are a number of masks applied, more than the number of corrections.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants